Apparatus, systems, and methods for crowdsourcing domain specific intelligence

ABSTRACT

The present disclosure provides apparatus, systems, and methods for crowdsourcing domain specific intelligence. The disclosed crowdsourcing mechanism can receive domain specific intelligence as a data processing rule module. For example, a data analytics system can request a crowd of software developers to provide a data processing rule module tailored to process a particular type of information from a particular domain. When the data analytics system receives the data processing rule module from one of the software developers for the particular domain, the data analytics system can use the received data processing rule module to process information associated with the particular domain.

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims benefit of the earlier filing date, under 35 U.S.C. § 119(e), of:

-   U.S. Provisional Application No. 61/799,986, filed on Mar. 15, 2013, entitled “SYSTEM FOR ANALYZING AND USING LOCATION BASED BEHAVIOR”;
-   U.S. Provisional Application No. 61/800,036, filed on Mar. 15, 2013, entitled “GEOGRAPHIC LOCATION DESCRIPTOR AND LINKER”;
-   U.S. Provisional Application No. 61/799,131, filed on Mar. 15, 2013, entitled “SYSTEM AND METHOD FOR CROWD SOURCING DOMAIN SPECIFIC INTELLIGENCE”;
-   U.S. Provisional Application No. 61/799,846, filed on Mar. 15, 2013, entitled “SYSTEM WITH BATCH AND REAL TIME DATA PROCESSING”; and
-   U.S. Provisional Application No. 61/799,817, filed on Mar. 15, 2013, entitled “SYSTEM FOR ASSIGNING SCORES TO LOCATION ENTITIES”.

This application is also related to:

-   U.S. patent application Ser. No. 14/214,208, filed on Mar. 14, 2014, entitled “APPARATUS, SYSTEMS, AND METHODS FOR ANALYZING MOVEMENTS OF TARGET ENTITIES”;
-   U.S. patent application Ser. No. 14/214,296, filed on Mar. 14, 2014, entitled “APPARATUS, SYSTEMS, AND METHODS FOR PROVIDING LOCATION INFORMATION”;
-   U.S. patent application Ser. No. 14/214,219, filed on Mar. 14, 2014, entitled “APPARATUS, SYSTEMS, AND METHODS FOR BATCH AND REALTIME DATA PROCESSING”;
-   U.S. patent application Ser. No. 14/214,309, filed on Mar. 14, 2014, entitled “APPARATUS, SYSTEMS, AND METHODS FOR ANALYZING CHARACTERISTICS OF ENTITIES OF INTEREST”; and
-   U.S. patent application Ser. No. 14/214,231, filed on Mar. 14, 2014, entitled “APPARATUS, SYSTEMS, AND METHODS FOR GROUPING DATA RECORDS”.

The entire content of each of the above-referenced applications (including both the provisional applications and the non-provisional applications) is herein incorporated by reference.

FIELD OF THE INVENTION

The present disclosure generally relates to systems and methods for crowdsourcing domain specific intelligence.

BACKGROUND

A large amount of information is created every day. Social networking sites and blogging sites receive millions of new postings every day, and new webpages are constantly being created to provide information about a person, a landmark, a business, or any other entity that people are interested in. Furthermore, the information is usually not available from a single repository, but is distributed across millions of repositories, often located around the world.

Because of the sheer volume and the distributed nature of information, it is difficult for people to consume information efficiently. To address this issue, data analytics systems can (1) gather the information using a crawler and (2) create a meaningful summary of the information so that the information can be consumed easily.

To create such a meaningful summary, the data analytics system often pre-processes (or cleans) the information to detect (e.g., find or anchor) and retrieve (e.g., extract) relevant data from the gathered information. To this end, the data analytics system can use a data processing module to search for data having known formats or structures. Unfortunately, data in certain domains can be formatted or structured in a non-conventional manner. Therefore, the data processing module has to be tailored to the particular domain using domain specific intelligence so that the data processing module can detect relevant data from the large amount of information.

Unfortunately, a single software programmer may have neither the domain specific intelligence nor the capacity to adequately tailor the data processing module to all domains of interest. Therefore, there is a need for an effective mechanism for providing domain specific intelligence to the data processing module.

SUMMARY

In general, in an aspect, embodiments of the disclosed subject matter can include an apparatus. The apparatus is configured to crowdsource domain specific intelligence from a plurality of persons. The apparatus can include one or more interfaces configured to provide communication with a first plurality of computing devices and a second plurality of computing devices, wherein one of the first plurality of computing devices is operated by one of the plurality of persons having knowledge of a particular domain. The apparatus can also include a processor, in communication with the one or more interfaces, and configured to run one or more modules. The one or more modules are operable to cause the apparatus to receive a plurality of data processing rule (DPR) modules from the first plurality of computing devices, wherein one of the plurality of DPR modules is tailored for use in a particular domain, and the one of the plurality of DPR modules is provided by one of the plurality of persons based on the knowledge of the particular domain; and group the plurality of DPR modules into a first DPR module package to provide the knowledge of the particular domain as a package.

In general, in an aspect, embodiments of the disclosed subject matter can include a method for crowdsourcing domain specific intelligence from a plurality of persons. The method can include providing, by one or more interfaces in an apparatus, communication with a first plurality of computing devices and a second plurality of computing devices, wherein one of the first plurality of computing devices is configured to be operated by one of the plurality of persons having knowledge of a particular domain; receiving, at a data processing rule crowdsourcing (DPRC) module in the apparatus, a plurality of data processing rule (DPR) modules from the first plurality of computing devices, wherein one of the plurality of DPR modules is tailored for use in a particular domain, and one of the plurality of DPR modules is provided by one of the plurality of persons based on the knowledge of the particular domain; and grouping the plurality of DPR modules into a first DPR module package to provide the knowledge of the particular domain as a package.

In general, in an aspect, embodiments of the disclosed subject matter can include a non-transitory computer readable medium. The non-transitory computer readable medium can include executable instructions operable to cause a data processing apparatus to provide, by one or more interfaces in the apparatus, communication with a first plurality of computing devices and a second plurality of computing devices, wherein one of the first plurality of computing devices is configured to be operated by one of the plurality of persons having knowledge of a particular domain; receive, at a data processing rule crowdsourcing (DPRC) module in the apparatus, a plurality of data processing rule (DPR) modules from the first plurality of computing devices, wherein one of the plurality of DPR modules is tailored for use in a particular domain, and one of the plurality of DPR modules is provided by one of the plurality of persons based on the knowledge of the particular domain; and group the plurality of DPR modules into a first DPR module package to provide the knowledge of the particular domain as a package.

In any one of the embodiments disclosed herein, the apparatus, the method, or the non-transitory computer readable medium can include modules, steps, or executable instructions for sending a DPR module request, to the second plurality of computing devices, requesting the second plurality of computing devices to provide a DPR module for a predetermined domain, wherein the DPR module request includes information indicative of functional requirements of the requested DPR module.

In any one of the embodiments disclosed herein, the apparatus, the method, or the non-transitory computer readable medium can include modules, steps, or executable instructions for receiving the requested DPR module from one of the second plurality of computing devices and determining that the received DPR module satisfies the functional requirements.

In any one of the embodiments disclosed herein, the apparatus, the method, or the non-transitory computer readable medium can include modules, steps, or executable instructions for receiving the requested DPR module from one of the second plurality of computing devices, wherein the one of the second plurality of computing devices is configured to determine that the DPR module received by the apparatus satisfies the functional requirements.

In any one of the embodiments disclosed herein, the plurality of DPR modules is configured to operate on a virtual machine.

In any one of the embodiments disclosed herein, the plurality of DPR modules is configured to operate on a system capable of running machine code compiled from two or more languages.

In any one of the embodiments disclosed herein, the apparatus, the method, or the non-transitory computer readable medium can include modules, steps, or executable instructions for sending the first DPR module package to a server in communication with the apparatus for use at the server.

In any one of the embodiments disclosed herein, one of the plurality of DPR modules is configured to call a DPR module in a second DPR module package, and the apparatus, the method, or the non-transitory computer readable medium can further include modules, steps, or executable instructions for maintaining a dependency between the first DPR module package and the second DPR module package.

In any one of the embodiments disclosed herein, the apparatus, the method, or the non-transitory computer readable medium can include modules, steps, or executable instructions for sending, in addition to the first DPR module package, the second DPR module package to the server.

In any one of the embodiments disclosed herein, the apparatus, the method, or the non-transitory computer readable medium can include modules, steps, or executable instructions for maintaining a resource, and one of the plurality of DPR modules is configured to use the resource to provide a context-aware functionality.

In any one of the embodiments disclosed herein, the apparatus, the method, or the non-transitory computer readable medium can include modules, steps, or executable instructions for providing an application programming interface (API) to enable an external system to use one of the plurality of DPR modules maintained by the apparatus.

DESCRIPTION OF THE FIGURES

Various objects, features, and advantages of the present disclosure can be more fully appreciated with reference to the following detailed description when considered in connection with the following drawings, in which like reference numerals identify like elements. The following drawings are for the purpose of illustration only and are not intended to be limiting of the disclosed subject matter, the scope of which is set forth in the claims that follow.

FIG. 1 illustrates a data analytics system in accordance with some embodiments.

FIG. 2 illustrates a process for gathering data processing rule (DPR) modules in accordance with some embodiments.

FIG. 3 illustrates an exemplary DPR module in accordance with some embodiments.

FIG. 4 illustrates a tree structured dependency of packages in accordance with some embodiments.

FIG. 5 illustrates a relationship between components of the DP engine in accordance with some embodiments.

FIG. 6 illustrates a process for instantiating a universe to call a DPR module in a package in accordance with some embodiments.

DESCRIPTION OF THE DISCLOSED SUBJECT MATTER

To process information from a particular domain, a data analytics system may use intelligence specific to that particular domain. For example, a data analytics system may receive a web page that includes phone numbers formatted in accordance with the Italian standard. According to the Italian standard, all landline phone numbers begin with a “4”, whereas all mobile phone numbers begin with a “3”. Unless the data analytics system is aware of such domain specific intelligence, the data analytics system may not be able to adequately process the Italian phone numbers to determine whether a phone number is a landline number or a mobile phone number.

In some cases, such domain specific intelligence can be provided to the data analytics system as a data processing rule module. The data processing rule module can include instructions that are operable to detect information having a predetermined format.

In some cases, the data processing rule module can be provided by a single person. However, when there are many domains from which the information can be received, a single person may not be able to build data processing rule modules for all domains of interest. Even if the person could learn all domain-specific rules and build data processing rule modules for all domains of interest, this may not be the most efficient use of the person's time.

The present disclosure provides apparatus, systems, and methods for crowdsourcing domain specific intelligence. Because the data analytics system can receive the domain specific intelligence as a data processing rule module, the data analytics system can request a crowd of software developers, or other individuals capable of learning a domain specific language that can express simplified rules for expressing domain specific knowledge, to provide a data processing rule module tailored to process a particular type of information from a particular domain. When the data analytics system receives the data processing rule module from one of the software developers for the particular domain, the data analytics system can use the received data processing rule module to process information known to be associated with the particular domain. The disclosed crowdsourcing mechanism can facilitate a collaboration of software developers from a variety of domains by providing, to software developers, various pieces of a large problem. The disclosed crowdsourcing mechanism can also be used within a single organization by requesting software developers of the same organization to provide domain-specific data processing rule modules.

When referring to the domain specific intelligence, a domain can refer to an area of knowledge or activity. For example, a domain can include a geographical area (e.g., Europe), a field of expertise (e.g., computer science, law), an application with distinct data types (e.g., Italian phone number system), information about video games, subjects that tend to have topic specific slang or dialects, or any area of knowledge or activity from which data can be gathered.

In some embodiments, the data processing rule module can be configured to identify domain specific information that is formatted in accordance with a particular domain. For example, in Italy, a telephone number is represented by 6, 7, or 8 consecutive digits (e.g., XXXXXXXX, where X indicates a digit), whereas in the US, a telephone number is represented with three digits followed by four digits (e.g., XXX-XXXX, where X indicates a digit). Therefore, a data processing rule module for detecting an Italian telephone number can be configured to search for 6, 7, or 8 consecutive digits.
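The format rule in the Italian example can be sketched as a short pattern match. This is only an illustrative sketch in Python: the function name `find_italian_numbers` and the choice of a regular expression are assumptions made here, not part of the disclosure, and the pattern encodes nothing beyond the rule of 6, 7, or 8 consecutive digits stated above.

```python
import re
from typing import List

# Hypothetical rule: a run of 6, 7, or 8 digits that is not part of a
# longer run of digits (the lookarounds reject 5- and 9-digit runs).
ITALIAN_NUMBER = re.compile(r"(?<!\d)\d{6,8}(?!\d)")


def find_italian_numbers(text: str) -> List[str]:
    """Return every run of 6 to 8 consecutive digits found in the text."""
    return ITALIAN_NUMBER.findall(text)


print(find_italian_numbers("Call 3471234 or 12345, not 123456789."))
# prints ['3471234']
```

A module for another domain would carry a different pattern, which is why the rule is kept separate from the code that applies it.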

In some embodiments, the data processing rule module can be configured to identify domain specific information whose value has a particular meaning in a particular domain. Referring back to the Italian phone number example, the data processing rule module specific to the Italian phone number system can include a rule that, when a phone number begins with a “4”, the phone number is a landline number; and that when a phone number begins with a “3”, the phone number is a mobile phone number.
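A value-based rule of this kind can be sketched as a lookup on the leading digit. The sketch below is an assumption-laden illustration in Python: the names `LINE_TYPE_BY_PREFIX` and `classify_line_type` are hypothetical, and the prefix table simply copies the two prefixes given in the example above rather than any authoritative numbering plan.

```python
from typing import Optional

# Prefix table copied from the example in the text (illustrative only).
LINE_TYPE_BY_PREFIX = {"4": "landline", "3": "mobile"}


def classify_line_type(number: str) -> Optional[str]:
    """Return 'landline' or 'mobile' based on the first digit, else None."""
    if not number:
        return None
    return LINE_TYPE_BY_PREFIX.get(number[0])


print(classify_line_type("3471234"))  # prints mobile
print(classify_line_type("4123456"))  # prints landline
```

Returning `None` for an unknown prefix leaves the decision about unrecognized numbers to the caller instead of guessing.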

In some embodiments, a data processing system includes two subsystems: one or more data processing rule modules and a data processing engine. The data processing engine can be configured to receive information from a data source, such as a web page, and to detect domain specific information from the received information. To this end, once the data processing engine receives information from the data source, the data processing engine is configured to use one or more data processing rule modules to detect the domain specific information. Subsequently, the data processing engine can use the detected domain specific information to identify meaningful features of the received information.

In some embodiments, a data processing rule module can be dynamic. For example, the data processing rule module can be easily modified, replaced, or removed from the data processing system. In some sense, the data processing rule module can be considered to be an expression of a data type. In contrast to the data processing rule module, the data processing engine can be static. The data processing engine can form a backbone of the data processing system, and may not be easily modified, replaced, or removed from the data processing system.
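The split between a static engine and dynamic rule modules can be sketched as a small registry. This is a minimal sketch under stated assumptions, not the disclosed engine: the class name `DataProcessingEngine`, its methods, and the modeling of a rule module as a plain callable from text to detected values are all hypothetical.

```python
from typing import Callable, Dict, List

# Assumption: a rule module is modeled as a callable that takes text and
# returns the list of domain specific values it detected.
RuleModule = Callable[[str], List[str]]


class DataProcessingEngine:
    """Static backbone that applies whichever rule modules are registered."""

    def __init__(self) -> None:
        self._modules: Dict[str, RuleModule] = {}

    def register(self, name: str, module: RuleModule) -> None:
        # Adding or replacing a module does not change the engine itself.
        self._modules[name] = module

    def unregister(self, name: str) -> None:
        self._modules.pop(name, None)

    def process(self, text: str) -> Dict[str, List[str]]:
        # Run every registered module over the received information.
        return {name: module(text) for name, module in self._modules.items()}


engine = DataProcessingEngine()
engine.register("digit_tokens", lambda text: [w for w in text.split() if w.isdigit()])
print(engine.process("call 3471234 today"))
# prints {'digit_tokens': ['3471234']}
```

Because modules are looked up by name at call time, one can be swapped or removed without touching `process`, which mirrors the dynamic and static roles described above.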

In some embodiments, the data processing rule module received from the crowd of software developers can be implemented in a language that can be operated on a virtual machine. For example, the data processing rule module can be implemented in a variety of programming languages, including one or more of Java, Lisp, Clojure, JRuby, Scala, or JavaScript languages, and can operate in a Java Virtual Machine (JVM) using JVM's interface, for example, to JRuby and Lisp functions.

In some embodiments, the data processing rule module received from the crowd of software developers can be implemented in a language that can be accommodated by a system capable of compiling different languages into the same type of machine code or having multi-language properties. For example, the data processing rule module can be implemented in a variety of programming languages that can be accommodated by a Common Language Runtime (CLR), developed by MICROSOFT CORPORATION of Redmond, Wash. The CLR provides a machine environment (e.g., an operating platform) that is capable of running machine code compiled from two or more programming languages. As another example, the data processing rule module can be implemented in the Python language and the C language, which may be accommodated together by Cython.

The data analytics system can crowdsource data processing rule modules from a plurality of software developers using a data processing rule crowdsourcing module. The data processing rule crowdsourcing module is configured to receive or determine a specification for a data processing rule module, and send a data processing rule module request, which includes the specification, to a plurality of client devices, each operated by a software developer. When a software developer receives the data processing rule module request, the software developer can use his or her domain expertise to develop the requested data processing rule module, and provide the requested data processing rule module to the client device. Then the client device can provide the requested data processing rule module to the data processing rule crowdsourcing module, thereby completing the transaction.

In some cases, before sending the data processing rule module to the data processing rule crowdsourcing module, the client device can locally test the data processing rule module to determine whether the received data processing rule module satisfies the specification of the requested data processing rule module. If the received data processing rule module does not satisfy the specification, then the client device can provide a warning signal, and request the software developer to provide a revised data processing rule module. In other cases, when the client device does not perform such a local test, the data processing rule crowdsourcing module can be configured to determine whether the received data processing rule module satisfies the specification of the requested data processing rule module.
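One way to picture the specification check is as a list of input and expected-output pairs run against a submitted module. The sketch below is hypothetical throughout: the disclosure does not state how a specification is encoded, so the `Spec` type, the function `check_against_spec`, and the sample cases are assumptions chosen for illustration. The same check could run on the client device or at the crowdsourcing module.

```python
from typing import Callable, List, Tuple

# Assumption: a specification is a list of (input, expected output) pairs.
Spec = List[Tuple[str, str]]


def check_against_spec(module: Callable[[str], str], spec: Spec) -> List[str]:
    """Return one message per failed case; an empty list means the spec is met."""
    failures = []
    for given, expected in spec:
        try:
            actual = module(given)
        except Exception as exc:  # a crashing module also fails the spec
            failures.append(f"{given!r}: raised {exc!r}")
            continue
        if actual != expected:
            failures.append(f"{given!r}: expected {expected!r}, got {actual!r}")
    return failures


spec = [("3471234", "mobile"), ("4123456", "landline")]
submitted = lambda number: "mobile" if number.startswith("3") else "landline"
failures = check_against_spec(submitted, spec)
print("accepted" if not failures else "revise: " + "; ".join(failures))
# prints accepted
```

A non-empty result plays the role of the warning signal: it tells the developer which cases the revised module still has to satisfy.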

In some embodiments, the data processing rule crowdsourcing module can send the data processing rule module request to a plurality of client devices using a crowdsourcing platform. For example, the data processing rule crowdsourcing module can use Amazon Mechanical Turk to send the data processing rule module request to a plurality of software developers. As another example, the data processing rule crowdsourcing module can use an enterprise network to send the data processing rule module request to a plurality of software developers within the same organization.

FIG. 1 illustrates a data analytics system in accordance with some embodiments. The data analytics system 100 includes a host device 102, a communication network 104, and one or more client devices 106. The host device 102 can include a processor 108, a memory device 110, a data processing rule crowdsourcing (DPRC) module 112, a data processing (DP) engine 114, and one or more interfaces 116.

The processor 108 of the host device 102 can be implemented in hardware. The processor 108 can include an application specific integrated circuit (ASIC), programmable logic array (PLA), digital signal processor (DSP), field programmable gate array (FPGA), or any other integrated circuit. The processor 108 can also include one or more of any other applicable processors, such as a system-on-a-chip that combines one or more of a CPU, an application processor, and flash memory, or a reduced instruction set computing (RISC) processor. The memory device 110 of the host device 102 can include a computer readable medium, flash memory, a magnetic disk drive, an optical drive, a programmable read-only memory (PROM), and/or a read-only memory (ROM).

The DPRC module 112 is configured to coordinate the crowdsourcing of data processing rule (DPR) modules that are tailored to particular application domains. For example, the DPRC module 112 is configured to request one or more clients 106 to provide one or more DPR modules, and to receive, from the one or more clients 106, the requested DPR modules. The DPRC module 112 can subsequently provide the requested DPR modules to the DP engine 114.

The DP engine 114 can be configured to receive (1) DPR modules from the DPRC module 112 and (2) information from a variety of data sources, and process the received information using the one or more DPR modules to provide a feature of the received information. The DP engine 114 can be configured to operate a virtual machine, such as a Java Virtual Machine (JVM), that can interface with the one or more DPR modules. For example, the virtual machine can interface with DPR modules implemented using one of Java, Lisp, Clojure, JRuby, and JavaScript languages. The DP engine 114 can also be configured to operate a system capable of compiling different languages into the same type of machine code or having multi-language properties. For example, the DP engine 114 can operate a Common Language Runtime, developed by MICROSOFT CORPORATION of Redmond, Wash. As another example, the DP engine 114 can operate Cython, which can accommodate the Python language and the C language together.

In some embodiments, the DPRC module 112, the DP engine 114, and/or one or more DPR modules can be implemented in software stored in the non-transitory memory device 110, such as a non-transitory computer readable medium. The software stored in the memory device 110 can run on the processor 108 capable of executing computer instructions or computer code.

In some embodiments, one or more of the DPRC module 112, the DP engine 114, and/or one or more DPR modules can be implemented in hardware using an ASIC, PLA, DSP, FPGA, or any other integrated circuit. In some embodiments, one or more of the DPRC module 112, the DP engine 114, and/or one or more DPR modules can be implemented on the same integrated circuit, such as an ASIC, PLA, DSP, or FPGA, thereby forming a system on chip.

The host device 102 can include one or more interfaces 116. The one or more interfaces 116 provide a communication mechanism to communicate internal to, and external to, the host device 102. For example, the one or more interfaces 116 enable communication with clients 106 over the communication network 104. The one or more interfaces 116 can also provide an application programming interface (API) to other host devices, or computers coupled to the network 104 so that the host device 102 can receive location information, such as geo-location coordinates. The one or more interfaces 116 are implemented in hardware to send and receive signals in a variety of mediums, such as optical, copper, and wireless, and in a number of different protocols some of which may be non-transitory.

In some embodiments, the host device 102 can reside in a data center and form a node in a cloud computing infrastructure. The host device 102 can also provide services on demand. A module hosting a client is capable of migrating from one host device to another host device seamlessly, without causing program faults or system breakdown. The host device 102 on the cloud can be managed using a management system. Although FIG. 1 represents the host device 102 as a single device, the host device 102 can include more than one device.

The client 106 can include any platform capable of computation. Non-limiting examples can include a computer, such as a desktop computer, a mobile computer, a tablet computer, a netbook, a laptop, a server, a cellular device, or any other computing device having a processor and memory, and any equipment with computation capabilities. The client 106 is configured with one or more processors that process instructions and run software that may be stored in memory. The processor also communicates with the memory and interfaces to communicate with other devices. The processor can be any applicable processor such as a system-on-a-chip that combines a CPU, an application processor, and flash memory. The client 106 can also provide a variety of user interfaces such as a keyboard, a touch screen, a trackball, a touch pad, and/or a mouse. The client 106 may also include speakers and a display device in some embodiments.

In some embodiments, the host device 102 can communicate with clients 106 directly, for example via a software application programming interface (API). In other embodiments, the host device 102 and the one or more client devices 106 can communicate via the communication network 104.

The communication network 104 can include the Internet, a cellular network, a telephone network, a computer network, a packet switching network, a line switching network, a local area network (LAN), a wide area network (WAN), a global area network, or any number of private networks currently referred to as an Intranet, and/or any other network or combination of networks that can accommodate data communication. Such networks may be implemented with any number of hardware and software components, transmission media and network protocols. Although FIG. 1 represents the network 104 as a single network, the network 104 can include multiple interconnected networks listed above.

The apparatus, systems, and methods disclosed herein are useful for crowdsourcing domain specific intelligence. As an example, a web-based system to respond to user queries may utilize DPR modules to process user queries. The development of the DPR modules may benefit from domain specific knowledge. As such, there is a need for apparatus, systems, and methods for crowdsourcing domain specific knowledge.

By way of example, a system might use DPR modules for interpreting a variety of queries about information specific to a variety of countries. For example, different countries use different formats for street addresses, and as such, different DPR modules may be used for processing queries regarding addresses in different countries. Additionally, individuals in the United States, for example, may be more familiar with street address formats, phone number formats, and other conventions specific to the United States, whereas individuals in Italy, for example, may be more familiar with conventions specific to Italy. In this example, it would be beneficial for individuals with more domain specific knowledge of the United States to develop DPR modules for processing queries related to the United States, and for individuals with more domain specific knowledge of Italy to develop DPR modules for processing queries related to Italy.

Additionally, it is desirable to accumulate many such “micro” DPR modules into a larger set of knowledge to govern complex systems.

Additionally, it is desirable to be able to combine the results of distributed tasks with an existing system without updating the entire system. For example, in a DPR module database, it is desirable for individuals with knowledge about a specific domain, for example China, to be able to develop DPR modules specific to that domain, and to be able to combine those DPR modules with an existing system for responding to queries.

Accordingly, it is desirable to be able to distribute tasks between individuals with domain specific knowledge relevant to certain tasks, and then to be able to combine the results with an existing system and with the results of other distributed tasks.

In some embodiments, the disclosed apparatus, systems, and methods allow tasks to be distributed between individuals and allow for the results of the tasks to be combined with other results and systems.

In some embodiments, the disclosed apparatus, systems, and methods allow software developers having different knowledge or expertise to write various pieces of a software system in one or more programming languages. For example, DPR modules can be implemented in one or more of Java, Clojure, JRuby, Scala, and JavaScript languages. The disclosed apparatus, systems, and methods can beneficially allow both data engineers and data labs teams to collaborate by working on various pieces of a larger problem.

One of the advantages of the disclosed framework is the ability to crowdsource domain specific intelligence. For instance, if there were a need to parse a country's phone number, a person from the country of interest can program a DPR module specific to that country. Similarly, different people can write DPR modules for parsing phone numbers (or other information) for other countries. Each person can program the DPR modules using a computer language of their preference. The programmed DPR modules can then be tested locally (e.g., each person can test their own code) without having to work within an existing rules-based system. After testing, the team could then merge the code into the code base and make the new DPR module(s) available to the rest of the company.

As an additional example, if a team wanted to build a database of landmarks of the world based on Wikipedia pages, they might need to handle several steps. One team member might use an advanced natural language processing algorithm to determine if the page was about a landmark. This person might also need someone to write a parser to figure out name, country, city, and date built. The disclosed apparatus, systems, and methods provide ways for these steps to be done by different people using the language that they are most comfortable with or that lends itself well to the task. A team could even outsource other easier tasks.

The domain intelligence crowdsourcing mechanism can involve the DPR module gathering step, the DPR module packaging step, and the DPR module deployment step. The DPR module gathering step can include receiving, at a DPRC module 112, one or more DPR modules associated with one or more domains. The DPR module packaging step can include collecting, by the DPRC module 112, the received DPR modules and packaging the DPR modules into a package based on the functionalities associated with the DPR modules. The DPR module deployment step can involve processing data, at a DP engine 114, using DPR modules in the package.
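The three steps can be sketched end to end with a few lines of Python. Everything named here is an assumption for illustration: the `DprModule` record, its `functionality` and `domain` labels, and the helper functions `package_by_functionality` and `deploy` are hypothetical and do not describe the actual DPRC module 112 or DP engine 114.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DprModule:
    name: str
    domain: str                  # e.g. "Italy"
    functionality: str           # e.g. "phone-parsing"
    run: Callable[[str], str]    # the rule itself, modeled as a callable


def package_by_functionality(modules: List[DprModule]) -> Dict[str, List[DprModule]]:
    """Packaging step: group received modules by the functionality they provide."""
    packages: Dict[str, List[DprModule]] = defaultdict(list)
    for module in modules:
        packages[module.functionality].append(module)
    return dict(packages)


def deploy(package: List[DprModule], domain: str, data: str) -> str:
    """Deployment step: process data with the package's module for the domain."""
    for module in package:
        if module.domain == domain:
            return module.run(data)
    raise LookupError(f"no module for domain {domain!r}")


# Gathering step: modules as they might arrive from different contributors.
received = [
    DprModule("it_phone", "Italy", "phone-parsing", lambda s: "IT:" + s),
    DprModule("us_phone", "United States", "phone-parsing", lambda s: "US:" + s),
    DprModule("it_address", "Italy", "address-parsing", lambda s: s.title()),
]
packages = package_by_functionality(received)
print(sorted(packages))
# prints ['address-parsing', 'phone-parsing']
print(deploy(packages["phone-parsing"], "Italy", "3471234"))
# prints IT:3471234
```

Grouping by functionality keeps every country's phone rule in one package, so a deployed package can be extended with a new domain without disturbing the others.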

FIG. 2 illustrates a process for gathering DPR modules in accordance with some embodiments. The process 200, however, is exemplary. The process 200 may be altered, e.g., by having steps added, removed, or rearranged.

In step 202, the client 106 is configured to receive a DPR module operable to perform a data processing functionality in a particular domain. The client 106 can receive the DPR module from a user of the client 106, such as a software developer; alternatively, the client 106 can receive the DPR module from another computing device over a communication network. In some embodiments, the client 106 can present, to the user of the client 106, the functional requirements of the DPR module. For example, the client 106 can request the user to provide a DPR module that is capable of parsing an Italian phone number and determining whether the Italian phone number is associated with a landline or a mobile device.

In step 204, once the client 106 receives the DPR module, the client 106 can optionally test the functionality of the received DPR module to determine whether the received DPR module satisfies the functional requirements. For example, the client 106 can run the DPR module on a known list of Italian phone numbers to determine whether the DPR module is capable of identifying all Italian phone numbers having a variety of formats and is capable of correctly determining whether the phone number is associated with a landline or a mobile device. If the received DPR module satisfies the functional requirements, the client 106 can be triggered to move to step 206. If the received DPR module does not satisfy the functional requirements, the client 106 can notify the user that the DPR module has an error and that it should be revised.
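By way of a non-limiting illustration, the test of step 204 can be sketched as follows. The function names `classifyItalianPhone` and `findFailures`, the sample numbers, and the return labels are hypothetical and do not appear in the figures. The classification rule is a deliberate simplification that assumes, after removal of the +39 or 0039 country prefix, that mobile numbers begin with "3" and landline numbers begin with "0".

```javascript
// Hypothetical DPR module: classify an Italian phone number.
// Simplifying assumption: once the +39 / 0039 prefix is removed,
// mobile numbers start with "3" and landline numbers start with "0".
function classifyItalianPhone(raw) {
  // Keep only digits and "+" so that spaces, dashes, and dots are ignored.
  let digits = String(raw).replace(/[^\d+]/g, "");
  if (digits.startsWith("+39")) digits = digits.slice(3);
  else if (digits.startsWith("0039")) digits = digits.slice(4);
  // Assumed plausible length range for the national number.
  if (!/^\d{6,11}$/.test(digits)) return "invalid";
  if (digits.startsWith("3")) return "mobile";
  if (digits.startsWith("0")) return "landline";
  return "unknown";
}

// Step 204 sketch: run the module on a known list and collect mismatches.
function findFailures(dprModule, knownCases) {
  return knownCases.filter(({ input, expected }) => dprModule(input) !== expected);
}

// Hypothetical known list covering several formats.
const knownCases = [
  { input: "+39 06 1234 5678", expected: "landline" },
  { input: "0039 02 12345678", expected: "landline" },
  { input: "+39 347 123 4567", expected: "mobile" },
  { input: "347-123-4567", expected: "mobile" },
  { input: "not a number", expected: "invalid" },
];

const failures = findFailures(classifyItalianPhone, knownCases);
console.log(failures.length === 0 ? "module satisfies the requirements" : failures);
```

An empty failure list would correspond to moving to step 206; a non-empty list would correspond to notifying the user that the DPR module should be revised.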

In some embodiments, the client 106 can test the functionality of the DPR modules using a test module. In some cases, the client 106 can receive the test module from the user of the client 106. In other cases, the client 106 can receive the test module from the DPRC module 112.

In step 206, the client 106 can send the DPR module to the DPRC module 112 so that the DPR module can be packaged with other DPR modules into a DPR package.

In some embodiments, the DPRC module 112 can optionally cause the client 106 to receive the DPR module from, for example, the user of the client 106. For example, in step 208, prior to step 202, the DPRC module 112 can be configured to send a DPR module request to the client 106, requesting the client 106 to provide a DPR module. The DPR module request can include the functional requirements of the DPR module (e.g., a specification of a function to be performed by the DPR module). For example, the DPR module request can indicate that the DPR module should be able to parse Italian phone numbers and to determine whether an Italian phone number is associated with a landline or a mobile device. The DPR module request can also indicate a list of programming languages that can be used to implement the DPR module.

In some embodiments, the DPRC module 112 can be configured to send the DPR module request to a plurality of clients 106 and receive one or more DPR modules from at least one of the plurality of clients 106. Subsequently, the DPRC module 112 can be configured to select, from the one or more DPR modules, the final DPR module for the function specified in the DPR module request. In some cases, the DPRC module 112 can be configured to select, as the final DPR module, the DPR module that was first received by the DPRC module 112. In other cases, the DPRC module 112 can be configured to select, as the final DPR module, the DPR module that has the lowest computational complexity or the lowest computation time.
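The two selection policies described above can be sketched as follows. This is an illustrative sketch only: the submission record shape (`id`, `receivedAt`, `run`), the strategy names, and the use of wall-clock timing over sample inputs as a proxy for computation time are all assumptions rather than details of the disclosed DPRC module 112.

```javascript
// Measure how long a submitted module takes on sample inputs (milliseconds).
function measureMillis(run, sampleInputs) {
  const start = process.hrtime.bigint();
  for (const input of sampleInputs) run(input);
  return Number(process.hrtime.bigint() - start) / 1e6;
}

// Each submission is a hypothetical record: { id, receivedAt, run }.
function selectFinalModule(submissions, strategy, sampleInputs = []) {
  if (submissions.length === 0) return null;
  if (strategy === "first-received") {
    // Pick the submission with the earliest receipt timestamp.
    return submissions.reduce((a, b) => (b.receivedAt < a.receivedAt ? b : a));
  }
  if (strategy === "fastest") {
    // Pick the submission with the lowest measured computation time.
    const timed = submissions.map((s) => ({ s, ms: measureMillis(s.run, sampleInputs) }));
    return timed.reduce((a, b) => (b.ms < a.ms ? b : a)).s;
  }
  throw new Error(`unknown strategy: ${strategy}`);
}
```

Measured time depends on the machine and the sample inputs, so a practical system would likely average several runs before comparing submissions.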

FIG. 3 illustrates an exemplary DPR module in accordance with some embodiments. FIG. 3 provides a class 302 within which the DPR module 304 is provided. While the exemplary DPR module 304 is provided in JavaScript, a DPR module can be provided in a variety of programming languages including, for example, one or more of Java, Lisp, Clojure, Ruby, JRuby, Scala, or JavaScript languages.

The DPR module 304 can be provided within a class 302. The class 302 can include, in addition to the DPR module 304, a header. The header can include a class name 306 of the class 302. The class name 306 can be used to refer to functions (e.g., DPR modules) that are defined within the class 302. For example, the DPR module “cleanCity” 304 can be referred to as “City#cleanCity”. The header can also include references to one or more packages 308 that are imported into the class 302. For example, the class 302 is configured to import a package (or a class) 308 called “common.Cleaners.” This way, the class 302 (and any DPR modules defined in the class 302) can use functions (e.g., DPR modules) provided in the package “common.Cleaners.” The imported packages 308 are also referred to as “dependencies”.

The exemplary DPR module “cleanCity” 304 is configured to receive a city name as an argument “value” and to canonicalize the city name into a predetermined representation. For example, the exemplary DPR module 304 is configured to convert common abbreviations of city names into a full name, for example, NY into New York, or LA into Los Angeles.

In some embodiments, the DPR module 304 can be configured to execute (e.g., call) DPR modules from other packages without knowing the underlying implementation of the DPR modules or the language(s) in which the DPR modules are written. For example, the function “$p.execute(‘common.Cleaners#trim’, value)” is configured to call the DPR module “trim” from the package “common.Cleaners”, which was imported into this class 302. The function “$p.execute(‘common.Cleaners#trim’, value)” does not need to understand the implementation of “common.Cleaners#trim” and does not need to understand the programming language in which the DPR module “trim” is implemented. In this case, the DPR module “trim” in the package “common.Cleaners” 308 is configured to strip out all leading and trailing whitespace in the input argument “value” of the DPR module 304.

In some embodiments, the DPR module 304 can include an embedded test module 310. The embedded test module 310 can be executed when the test is invoked and can report an error if the test is not passed.
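A minimal sketch of a module of this kind is given below for illustration. It is not a reproduction of FIG. 3. Only the call form `$p.execute(name, value)` and the names “City#cleanCity” and “common.Cleaners#trim” come from the description above; the `$p.define` helper, the in-memory registry, and the abbreviation table are hypothetical stand-ins.

```javascript
// Hypothetical stand-in for the "$p" object. Only "$p.execute(name, value)"
// appears in the description; "define" and the registry are assumptions.
const registry = new Map();
const $p = {
  define(name, fn) {
    registry.set(name, fn);
  },
  execute(name, value) {
    const fn = registry.get(name);
    if (!fn) throw new Error(`unknown DPR module: ${name}`);
    return fn(value);
  },
};

// Dependency "common.Cleaners#trim": strips leading and trailing whitespace.
$p.define("common.Cleaners#trim", (value) => String(value).trim());

// Illustrative abbreviation table (assumed, not taken from the figures).
const CITY_ABBREVIATIONS = new Map([
  ["NY", "New York"],
  ["LA", "Los Angeles"],
]);

// DPR module "City#cleanCity": canonicalize a city name.
$p.define("City#cleanCity", (value) => {
  // The caller does not need to know how, or in what language, trim is written.
  const trimmed = $p.execute("common.Cleaners#trim", value);
  return CITY_ABBREVIATIONS.get(trimmed.toUpperCase()) || trimmed;
});

// Embedded test: reports an error if the expectation is not met.
function testCleanCity() {
  const got = $p.execute("City#cleanCity", "  NY ");
  if (got !== "New York") throw new Error(`cleanCity test failed: got "${got}"`);
}
testCleanCity();
```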

In some embodiments, once the DPRC module 112 receives DPR modules from one or more clients 106, the DPRC module 112 is configured to group the received DPR modules into a DPR module package. In particular, the DPRC module 112 is configured to group DPR modules that are programmed in one or more languages supported by the DP engine 114. In some cases, a package can include DPR modules programmed using a single programming language. For example, the DPRC module 112 can be configured to group all DPR modules programmed in Clojure as a first package, and to group all DPR modules programmed in JavaScript as a second package.
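The per-language grouping can be sketched as follows, assuming each received DPR module is described by a hypothetical record with `name` and `language` fields. Modules in languages that the DP engine does not support are skipped in this sketch; how an actual DPRC module 112 treats them is not specified.

```javascript
// Group received DPR modules into one package per supported language.
function groupByLanguage(modules, supportedLanguages) {
  const packages = new Map();
  for (const dprModule of modules) {
    if (!supportedLanguages.has(dprModule.language)) continue; // unsupported: skipped
    if (!packages.has(dprModule.language)) packages.set(dprModule.language, []);
    packages.get(dprModule.language).push(dprModule.name);
  }
  return packages;
}

// Hypothetical submissions.
const received = [
  { name: "City#cleanCity", language: "javascript" },
  { name: "State#fromCity", language: "clojure" },
  { name: "Phone#parse", language: "javascript" },
  { name: "Zip#check", language: "cobol" },
];
const packagesByLanguage = groupByLanguage(received, new Set(["clojure", "javascript"]));
console.log(packagesByLanguage);
```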

In some embodiments, the DPRC module 112 can be configured to maintain resources. Resources can generally include files, databases of values, indexes of information, or other data. Examples of resources include: maps, lists, a list of cities in a particular country, a set of regular expressions for phone number variations, a mapping from abbreviations to full names or values of cities, a set of polygons representing postcodes, and other elements that can be referenced by the rules and program instructions. Such a resource could be used in a variety of applications. For example, the list of cities in a particular country can be used to reject city names that are not on the list. This allows a system to limit values of a city attribute to those on that list. As another example, the set of polygons representing postcodes can be used to check whether a particular location is actually inside the postcode associated with it. As another example, a map can be used to determine in which country a landmark is located based on the city's name.

In some embodiments, a resource can be utilized by DPR modules to provide context-aware functionalities. For example, a DPR module can refer to the resource to determine a physical location of a computing device on which the DPR module is operating, and the DPR module can adapt its functionality to the physical location to provide a location-aware functionality. As another example, a DPR module can be configured to determine a phone number in a document. The DPR module can use the resource to determine a geographical location from which the document originated, or the language in which the document is written. Subsequently, the DPR module can use the geographical location information or the language information to determine which one of the sub-DPR modules to use (e.g., a DPR module for extracting an Italian phone number or a US phone number) to extract phone numbers from the document.
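The phone number example can be illustrated with the following sketch. The resource here is a hypothetical mapping from a host name suffix to a country code, used as a stand-in for whatever resource identifies the origin of a document. The two sub-module regular expressions are simplified assumptions and do not cover real-world number formats.

```javascript
// Hypothetical resource: host name suffix -> country of origin.
const countryByHostSuffix = new Map([
  [".it", "IT"],
  [".us", "US"],
]);

// Hypothetical sub-DPR modules, one per country (simplified patterns).
const phoneSubModules = new Map([
  ["IT", (text) => text.match(/\+39(?: ?\d){6,11}/g) || []],
  ["US", (text) => text.match(/\(\d{3}\) \d{3}-\d{4}/g) || []],
]);

// Context-aware DPR module: consult the resource, then delegate.
function extractPhoneNumbers(doc) {
  const suffix = doc.sourceHost.slice(doc.sourceHost.lastIndexOf("."));
  const country = countryByHostSuffix.get(suffix);
  const subModule = phoneSubModules.get(country);
  return subModule ? subModule(doc.text) : []; // unknown origin: nothing extracted
}

console.log(extractPhoneNumbers({ sourceHost: "example.it", text: "Chiamaci: +39 06 1234 5678 oggi" }));
console.log(extractPhoneNumbers({ sourceHost: "example.us", text: "Call (212) 555-0100 now" }));
```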

In some embodiments, a package can have dependencies. For example, a DPR module in a first package can include a subroutine that calls a DPR module in a second package. As another example, a DPR module in a first package can include a subroutine that uses a resource.

In some embodiments, dependencies between packages and resources can be represented as a tree structure. FIG. 4 illustrates a tree structured dependency of packages and resources in accordance with some embodiments. The tree structured dependency 400 includes four distinct elements: A 402, B 404, C 406, and D 408. Each of these elements can correspond to one of a package or a resource. This tree structured dependency represents a scenario in which the element A 402 depends on elements B 404 and C 406, and the element B 404 depends on elements C 406 and D 408. The element C 406 has been duplicated for illustration purposes, but could be represented as a single element in the tree 400 by re-wiring the dependencies between A 402 and C 406. This tree structured dependency can be later flattened by the DP engine 114 or the DPRC module 112 to determine the order in which packages are loaded to the DP engine 114 or to another package. This process is described further below.

Once the DPRC module 112 prepares one or more DPR module packages, the DPRC module 112 can provide the packages to the DP engine 114. The DP engine 114 can subsequently use the DPR modules in the one or more packages to process input data.

The DP engine 114 can include instructions that can instantiate one or more of the following software components (or classes): a universe, a package, a resource, and a DPR module. FIG. 5 illustrates a relationship between components of the DP engine in accordance with some embodiments. FIG. 5 includes a universe 502, one or more packages 504A-504C, one or more resources 506A-506B, and one or more DPR modules 508A-508C.

As described above, the packages 504 can each include one or more DPR modules 508.

The universe 502 may represent a container or an environment within which some or all of the DPR modules 508 reside. Therefore, the universe 502 can include DPR modules 508 that are implemented in a variety of programming languages supported by the DP engine 114. In some embodiments, the universe 502 can provide the DP engine 114 with a directory of DPR modules 508 and/or packages 504 including DPR modules 508. From the DP engine's perspective, the universe 502 may be the only way to execute DPR modules 508. For example, the DP engine 114 may not call any DPR modules 508 in the universe 502 unless the DP engine 114 first instantiates the universe 502 in which the DPR modules 508 reside.

In some embodiments, the universe 502 may be the gatekeeper for external Application Programming Interfaces (APIs) to access or call DPR modules residing in the universe 502. In particular, an external program may be able to call a DPR module 508 in the universe 502 only by using an API that couples the external program to the DPR module 508 in the universe 502. In the exemplary embodiment of FIG. 5, the DPR module “packageA#DPRModule 1” can be called through the universe 502 by an external Java program or by a DPR module 508 residing internally in the DPR package 504 or the resource 506.

FIG. 6 illustrates a process for instantiating a universe to call a DPR module in a package in accordance with some embodiments. The process 600, however, is exemplary. The process 600 may be altered, e.g., by having steps added, removed, or rearranged.

In step 602, the DP engine 114 can use a header to import one or more DPR module packages that include the desired DPR modules. In this case, the DP engine 114 imports (1) packageA.City so that the DP engine can use DPR modules in packageA.City and (2) packageA.State so that the DP engine can use DPR modules in packageA.State.

In some embodiments, a plurality of DPR module packages imported by the DP engine 114 can have dependencies between them. The DP engine 114 can be configured to take the dependencies into account to determine the order in which the plurality of DPR module packages is loaded onto the DP engine 114. In particular, when the dependencies are represented as a tree structure, the DP engine 114 can be configured to flatten the dependencies so that leaves of the tree structure can be loaded prior to the root of the tree structure.

For example, if the DP engine 114 is configured to import DPR module packages having the dependencies of FIG. 4, the DP engine can be configured to load the DPR module packages in the following order: D→C→B→A. This loading order can ensure that all of A's dependencies are loaded onto the DP engine 114 before A.
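One way to flatten such a dependency structure is a depth-first, post-order traversal, sketched below. This is an illustrative choice, not necessarily the traversal used by the DP engine 114; the function name `loadOrder` and the cycle check are assumptions. For the dependencies described for elements A through D, this particular traversal yields the order C, D, B, A, which differs from D→C→B→A only in the relative order of the two leaves. Either order satisfies the requirement that every dependency be loaded before the element that depends on it.

```javascript
// Flatten a dependency structure so that dependencies precede dependents.
function loadOrder(root, dependencies) {
  const order = [];
  const done = new Set();
  const inProgress = new Set();

  function visit(name) {
    if (done.has(name)) return; // shared dependency (like C) is loaded once
    if (inProgress.has(name)) throw new Error(`circular dependency at ${name}`);
    inProgress.add(name);
    for (const dep of dependencies.get(name) || []) visit(dep);
    inProgress.delete(name);
    done.add(name);
    order.push(name); // emitted only after all of its dependencies
  }

  visit(root);
  return order;
}

// A depends on B and C; B depends on C and D.
const dependencies = new Map([
  ["A", ["B", "C"]],
  ["B", ["C", "D"]],
  ["C", []],
  ["D", []],
]);

console.log(loadOrder("A", dependencies).join(" -> ")); // C -> D -> B -> A
```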

In step 604, the DP engine 114 can instantiate a software object that is capable of accessing a package having the desired DPR module. For example, the DP engine 114 can instantiate an object of the class Scarecrow that is capable of accessing the package “packageA.City”.

In step 606, the DP engine 114 can instantiate a software object of the class Universe that is associated with the Scarecrow object from step 604. For example, the DP engine 114 can instantiate an object of the Universe class in which the package packageA.City resides.

In step 608, the DP engine 114 can call the desired DPR module through the software object of the Universe class instantiated in step 606. For example, the software object can use a call-back function to call the cleanCity module 612 in the package packageA.City. The call-back function can be used to provide an argument for the cleanCity module 612 as well.
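Steps 602 through 608 can be sketched as follows. Only the class names Scarecrow and Universe and the module name “packageA.City#cleanCity” come from the description; the constructor arguments, the `resolve` and `call` methods, the package representation, and the simplified cleanCity body are hypothetical and chosen only to make the sequence of steps concrete.

```javascript
// Hypothetical Scarecrow: an object capable of accessing imported packages.
class Scarecrow {
  constructor(packages) {
    this.packages = packages; // Map: package name -> Map of module name -> function
  }
  resolve(qualifiedName) {
    const [packageName, moduleName] = qualifiedName.split("#");
    const pkg = this.packages.get(packageName);
    const dprModule = pkg && pkg.get(moduleName);
    if (typeof dprModule !== "function") throw new Error(`cannot resolve ${qualifiedName}`);
    return dprModule;
  }
}

// Hypothetical Universe: the only path through which DPR modules are called.
class Universe {
  constructor(scarecrow) {
    this.scarecrow = scarecrow;
  }
  // The call-back supplies the argument for the DPR module.
  call(qualifiedName, provideArgument) {
    return this.scarecrow.resolve(qualifiedName)(provideArgument());
  }
}

// Step 602: import the package containing the desired DPR module.
const importedPackages = new Map([
  ["packageA.City", new Map([
    ["cleanCity", (value) => (value.trim() === "NY" ? "New York" : value.trim())],
  ])],
]);
const scarecrow = new Scarecrow(importedPackages); // step 604
const universe = new Universe(scarecrow); // step 606
const cleaned = universe.call("packageA.City#cleanCity", () => " NY "); // step 608
console.log(cleaned);
```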

In some embodiments, a DPR module 112 can be wrapped by a Java object to be proxied by a common interface (e.g., a common set of input arguments and output values). More particularly, the common interface can receive a request, optionally alter it, and make the request to the underlying Java object that is wrapping the DPR module 112. Subsequently, the common interface can receive a response from the Java object, optionally handle the retries or exceptions, optionally alter the response, and return the response to the requester that sent the request to the common interface. This allows the DP engine 114 to use DPR modules that may be programmed in different programming languages. For example, in step 610, the DP engine 114 can be configured to run the DPR module “packageA.State#fromCity” 614 implemented in JRuby, not the JavaScript used to implement the DPR module “packageA.City#cleanCity” 612. To this end, the DPR modules can be executed (e.g., called) by the Java Virtual Machine (JVM) using one or more layers, and more particularly, the runtime data area (a layer of the JVM). For example, the runtime data area can provide a function area (e.g., method area) which is shared by multiple threads running in the JVM. This enables functions (e.g., methods) in different languages to be called by the JVM (or any other programs being executed on the JVM) since the functions are in the common area that is accessible by multiple threads.
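The request and response handling of the common interface can be sketched as follows. The description refers to a Java object as the wrapper; this sketch uses a JavaScript closure instead, purely for illustration. The option names `alterRequest`, `alterResponse`, and `maxAttempts`, the String-in and String-out convention, and the sample fromCity behavior are assumptions.

```javascript
// Wrap a DPR module behind a common String -> String interface.
function wrapWithCommonInterface(dprModule, options = {}) {
  const {
    alterRequest = (request) => request,
    alterResponse = (response) => response,
    maxAttempts = 1,
  } = options;

  return function call(request) {
    const input = String(alterRequest(request)); // optionally alter the request
    let lastError = new Error("maxAttempts must be at least 1");
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        const response = dprModule(input); // call the wrapped module
        return String(alterResponse(response)); // optionally alter the response
      } catch (error) {
        lastError = error; // remember the failure and retry if attempts remain
      }
    }
    throw lastError;
  };
}

// Hypothetical module that fails once before succeeding.
let attemptsSeen = 0;
const flakyFromCity = (city) => {
  attemptsSeen += 1;
  if (attemptsSeen === 1) throw new Error("transient failure");
  return city === "New York" ? "NY" : "unknown";
};

const fromCity = wrapWithCommonInterface(flakyFromCity, {
  alterRequest: (request) => request.trim(),
  maxAttempts: 2,
});
console.log(fromCity("  New York "));
```

Because the caller sees only the wrapper, the wrapped module could be replaced by one written in another language without changing the calling code.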

For example, the DPR module “cleanCity” 612 written in JavaScript is called from “packageA” but abides by an interface that a DPR module takes in a String as an argument and returns a value that is cast to a String. The DPR module “fromCity” 614, also in “packageA,” could have been written in another language, such as Clojure, by another user of the system. Though the two DPR modules are programmed in different languages and have been implemented independently, the two DPR modules are able to interact. This allows the DP engine 114 to share and cast supported data types (e.g., String, Boolean, Numbers) to different language environments to allow them to share functionality.

In some embodiments, the DP engine 114 can instantiate an environment for a particular programming language, such as methods or resources, on an as-needed basis. This is called a lazy instantiation of environments. Lazy instantiation means that objects are not created or loaded until they are used. For example, by step 608 of FIG. 6, the DP engine 114 executes only DPR modules programmed in JavaScript. Therefore, by step 608, the DP engine 114 does not need to instantiate environments for supporting other programming languages, such as Clojure and JRuby. In step 610, however, if the DPR module “packageA.State#fromCity” 614 is implemented in JRuby, the DP engine 114 can instantiate the environment for JRuby prior to executing “packageA.State#fromCity”. This way, the DP engine 114 can instantiate the environments only when there is a need to instantiate the environments. As another example, until a program requests the system to process an Italian phone number, the methods and resources for processing Italian phone numbers are not loaded onto the system, and, therefore, do not occupy space in memory.
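Lazy instantiation of per-language environments can be sketched as a cache of factories, as shown below. The class name `LazyEnvironments`, the factory functions, and the placeholder environment objects are hypothetical; an actual DP engine would create a real language runtime where this sketch merely records that a factory was invoked.

```javascript
// Create a language environment only on first use, then reuse it.
class LazyEnvironments {
  constructor(factories) {
    this.factories = factories; // Map: language -> function creating the environment
    this.instances = new Map(); // environments created so far
  }
  get(language) {
    if (!this.instances.has(language)) {
      const factory = this.factories.get(language);
      if (!factory) throw new Error(`unsupported language: ${language}`);
      this.instances.set(language, factory());
    }
    return this.instances.get(language);
  }
}

// Record which environments were actually created (placeholder environments).
const created = [];
const makeFactory = (language) => () => {
  created.push(language);
  return { language };
};
const environments = new LazyEnvironments(new Map([
  ["javascript", makeFactory("javascript")],
  ["jruby", makeFactory("jruby")],
  ["clojure", makeFactory("clojure")],
]));

environments.get("javascript"); // through step 608: only JavaScript is needed
console.log(created);
environments.get("jruby"); // step 610: the JRuby environment is created now
environments.get("javascript"); // reused, not created again
console.log(created);
```

In this sketch the Clojure environment is never created because no Clojure module is ever requested.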

In some embodiments, the lazy instantiation of environments can be accomplished, in part, by creating a reentrant object. A reentrant object is an object that can be safely called while the object is in the middle of processing because it either doesn't have an internal state or it properly handles states such that interruptions don't leave state inconsistent. In the example of processing an Italian phone number, the process of loading the phone number method does not replace the existing method until the loading process has completed successfully. Once the reentrant object is created, the DPR modules in the rules file can be evaluated (e.g., invoked) such that they become member functions of the reentrant object with appropriately isolated namespaces so that methods with the same name can be appropriately isolated. For example, by using appropriately isolated namespaces, the method “poi.Italy.phone_number” does not conflict with “poi.USA.phone_number”.
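Two of the properties described above, namely that a method is replaced only after loading succeeds and that fully qualified names keep same-named methods apart, can be sketched as follows. This sketch does not attempt to demonstrate reentrancy under concurrent calls. The class name `ModuleRegistry`, its methods, and the placeholder phone number functions are hypothetical.

```javascript
// Registry keyed by fully qualified name, with replace-on-success loading.
class ModuleRegistry {
  constructor() {
    this.methods = new Map();
  }
  load(qualifiedName, loader) {
    // The loader may throw; the existing entry is untouched until it succeeds.
    const candidate = loader();
    if (typeof candidate !== "function") {
      throw new Error(`${qualifiedName} did not load a function`);
    }
    this.methods.set(qualifiedName, candidate);
  }
  call(qualifiedName, value) {
    const method = this.methods.get(qualifiedName);
    if (!method) throw new Error(`not loaded: ${qualifiedName}`);
    return method(value);
  }
}

const methodRegistry = new ModuleRegistry();
// Same short name "phone_number", isolated by namespace.
methodRegistry.load("poi.Italy.phone_number", () => (value) => `IT:${value}`);
methodRegistry.load("poi.USA.phone_number", () => (value) => `US:${value}`);

// A failed reload leaves the previously loaded method in place.
try {
  methodRegistry.load("poi.Italy.phone_number", () => {
    throw new Error("load failed");
  });
} catch (error) {
  // The earlier Italian method is still registered.
}
console.log(methodRegistry.call("poi.Italy.phone_number", "06 1234 5678"));
```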

In some embodiments, the process 600 can be incorporated into a DPR module. For example, the steps 604-610 can be nested as a DPR module, as disclosed in FIG. 3, and the DPR module can be grouped into a package, as disclosed above. When the DPR module corresponding to the process 600 is called by a DP engine 114, the DP engine 114 is configured to execute the process 600 as described above.

In some embodiments, the DPRC module 112 is configured to update a data processing rule package. In some cases, the DPRC module 112 is configured to update the package in batch, for example, when the DPRC module 112 receives a predetermined number of new or updated DPR modules. In some cases, the DPRC module 112 is configured to update the package in substantially real time, for example, when the DPRC module 112 receives a new DPR module. In some cases, the DPRC module 112 is configured to update the package periodically, for example, after a predetermined period of time. For example, a client 106 may contribute, to the DPRC module 112, a new DPR module that is configured to identify Italian phone numbers. The DPRC module 112 can check for new DPR modules and/or updates to the existing DPR modules. Once the rebuild criterion is met (e.g., that a predetermined number of new or updated DPR modules has been received, that a single new or updated DPR module has been received, or that a predetermined amount of time has passed since the last update of the package), the DPRC module 112 can rebuild the DPR module package with the new or updated DPR modules. After the rebuild, any new input data received by the host device 102 can be evaluated using the new or updated DPR modules, and Italian phone numbers in that data can thus be detected.
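The three rebuild criteria can be sketched as a single predicate, shown below. The policy and state field names (`mode`, `batchSize`, `intervalMs`, `pendingModules`, `lastRebuildMs`) are hypothetical and serve only to make the three cases concrete.

```javascript
// Decide whether the DPR module package should be rebuilt.
function shouldRebuild(policy, state, nowMs) {
  switch (policy.mode) {
    case "batch":
      // A predetermined number of new or updated DPR modules has been received.
      return state.pendingModules >= policy.batchSize;
    case "real-time":
      // A single new or updated DPR module has been received.
      return state.pendingModules >= 1;
    case "periodic":
      // A predetermined amount of time has passed since the last rebuild.
      return nowMs - state.lastRebuildMs >= policy.intervalMs;
    default:
      throw new Error(`unknown rebuild mode: ${policy.mode}`);
  }
}

// Hypothetical state: one pending module, last rebuild at t = 0.
const packageState = { pendingModules: 1, lastRebuildMs: 0 };
console.log(shouldRebuild({ mode: "batch", batchSize: 5 }, packageState, 1000)); // false
console.log(shouldRebuild({ mode: "real-time" }, packageState, 1000)); // true
console.log(shouldRebuild({ mode: "periodic", intervalMs: 60000 }, packageState, 1000)); // false
```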

In some embodiments, the host device 102 can support the use of the same or different DPR modules, individually or in combination, within a large multi-device batch processing pipeline and within real-time server applications in which the host device 102 can respond to user actions or new incremental contributions of DPR modules.

In some embodiments, the host device 102 can be configured to retrieve the latest DPR modules or, alternatively, a specific version of DPR modules, and use them to process previously-received input data. Referring to the Italian phone number example above, the previously-received input data could be re-processed with the newly added DPR modules or a specific version of the DPR modules. The host device can thereby use the newly added DPR modules or the specific version of the DPR modules to recognize Italian phone numbers in previously examined and new web pages or user queries.

In some embodiments, the host device 102 can be configured to distribute DPR module packages or individual DPR modules to other computing devices, including, for example, the client 106 or other servers in communication with the host device 102. For example, a server in communication with the host device 102 can request the host device 102 to provide a particular DPR module package, and the host device 102 can, in response, determine package dependencies for using the particular DPR module package. Then, the host device 102 can provide, to the requesting server, the particular DPR module package and any other DPR module packages on which the particular DPR module package depends.

Other embodiments are within the scope and spirit of the disclosed subject matter.

The subject matter described herein can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structural means disclosed in this specification and structural equivalents thereof, or in combinations of them. The subject matter described herein can be implemented as one or more computer program products, such as one or more computer programs tangibly embodied in an information carrier (e.g., in a machine-readable storage device), or embodied in a propagated signal, for execution by, or to control the operation of, data processing apparatus (e.g., a programmable processor, a computer, or multiple computers). A computer program (also known as a program, software, software application, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file. A program can be stored in a portion of a file that holds other programs or data, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.

The processes and logic flows described in this specification, including the method steps of the subject matter described herein, can be performed by one or more programmable processors executing one or more computer programs to perform functions of the subject matter described herein by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus of the subject matter described herein can be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).

Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. Information carriers suitable for embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices); magnetic disks (e.g., internal hard disks or removable disks); magneto-optical disks; and optical disks (e.g., CD and DVD disks). The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.

To provide for interaction with a user, the subject matter described herein can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball), by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well. For example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form, including acoustic, speech, or tactile input.

The techniques described herein can be implemented using one or more modules. As used herein, the term “module” refers to computing software, firmware, hardware, and/or various combinations thereof. At a minimum, however, modules are not to be interpreted as software that is not implemented on hardware, firmware, or recorded on a non-transitory processor readable recordable storage medium. Indeed “module” is to be interpreted to include at least some physical, non-transitory hardware such as a part of a processor or computer. Two different modules can share the same physical hardware (e.g., two different modules can use the same processor and network interface). The modules described herein can be combined, integrated, separated, and/or duplicated to support various applications. Also, a function described herein as being performed at a particular module can be performed at one or more other modules and/or by one or more other devices instead of or in addition to the function performed at the particular module. Further, the modules can be implemented across multiple devices and/or other components local or remote to one another. Additionally, the modules can be moved from one device and added to another device, and/or can be included in both devices.

The subject matter described herein can be implemented in a computing system that includes a back-end component (e.g., a data server), a middleware component (e.g., an application server), or a front-end component (e.g., a client computer having a graphical user interface or a web browser through which a user can interact with an implementation of the subject matter described herein), or any combination of such back-end, middleware, and front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), e.g., the Internet.

The terms “a” or “an,” as used herein throughout the present application, can be defined as one or more than one. Also, the use of introductory phrases such as “at least one” and “one or more” should not be construed to imply that the introduction of another element by the indefinite articles “a” or “an” limits the corresponding element to only one such element. The same holds true for the use of definite articles.

It is to be understood that the disclosed subject matter is not limited in its application to the details of construction and to the arrangements of the components set forth in the following description or illustrated in the drawings. The disclosed subject matter is capable of other embodiments and of being practiced and carried out in various ways. Also, it is to be understood that the phraseology and terminology employed herein are for the purpose of description and should not be regarded as limiting.

As such, those skilled in the art will appreciate that the conception, upon which this disclosure is based, may readily be utilized as a basis for the designing of other structures, methods, and systems for carrying out the several purposes of the disclosed subject matter. It is important, therefore, that the claims be regarded as including such equivalent constructions insofar as they do not depart from the spirit and scope of the disclosed subject matter.

Although the disclosed subject matter has been described and illustrated in the foregoing exemplary embodiments, it is understood that the present disclosure has been made only by way of example, and that numerous changes in the details of implementation of the disclosed subject matter may be made without departing from the spirit and scope of the disclosed subject matter.

We claim:
 1. An apparatus configured to crowdsource information for processing domain information received from a data source, the apparatus comprising: one or more interfaces configured to provide communication with a plurality of computing devices, wherein the plurality of computing devices are operated by a plurality of software developers; and a processor, in communication with the one or more interfaces, configured to: receive information from a data source including a web page and the web page containing domain information; generate a data processing rule (DPR) module request requesting a DPR module, wherein the request contains a specification of the DPR module and the specification is tailored to process the domain information in the web page; transmit the generated DPR module request to a computing device in the plurality of computing devices operated by a software developer who has knowledge of the domain information in the web page; receive a DPR module from the computing device, wherein the received DPR module is implemented by the software developer using a programming language according to the specification in the DPR module request, the received DPR module including rules representing the software developer's knowledge of the domain information in the web page and the rules being programmed into the DPR module using the programming language; and process the received DPR module including the rules representing the software developer's knowledge of the domain information in the web page, wherein processing the received DPR module allows the processor to determine that the domain information in the web page has a particular meaning.
 2. The apparatus of claim 1, wherein the domain information in the web page includes a telephone number of a specific country and the specification of the DPR module request requires the DPR module to be capable of parsing telephone numbers of the specific country and determining whether the telephone number of the specific country is associated with a landline or a mobile device.
 3. The apparatus of claim 2, wherein, in the step of transmitting the generated DPR module request, the generated DPR module request is transmitted to a computing device in the plurality of computing devices operated by a software developer who is familiar with telephone number formats of the specific country.
 4. The apparatus of claim 3, wherein the received DPR module includes rules representing the software developer's familiarity with the telephone number formats of the specific country and processing the received DPR module allows the processor to determine that the telephone number in the web page is associated with a landline or a mobile device.
 5. The apparatus of claim 1, wherein the processor is further configured to receive a user query from a computing device and process the user query using the DPR module.
 6. The apparatus of claim 1, wherein the DPR module is implemented using Java, Lisp, Clojure, JRuby, Scala, or JavaScript programming language.
 7. The apparatus of claim 6, wherein the processor is further configured to generate additional DPR module requests requesting additional DPR modules, wherein each of the additional requests contains a specification of the corresponding additional DPR module and the specification is tailored to process the domain information in the web page.
 8. The apparatus of claim 7, wherein the processor is further configured to receive the additional DPR modules from one or more of the plurality of computing devices operated by the software developers.
 9. The apparatus of claim 8, wherein the processor is further configured to group the received DPR modules into packages by type of programming language used to implement the received DPR modules.
 10. The apparatus of claim 9, wherein the received DPR modules are grouped into a first package containing DPR modules all implemented using Clojure, a second package containing DPR modules all implemented using Ruby, and a third package containing DPR modules all implemented using JavaScript.
 11. The apparatus of claim 9, wherein the packages have dependencies with each other and a DPR module in one of the packages includes a subroutine that calls a DPR module in another one of the packages.
 12. The apparatus of claim 7, wherein the processor is further configured to process the additional DPR modules, wherein processing the additional DPR modules allows the processor to determine that the domain information in the web page has other particular meanings.
 13. The apparatus of claim 1, wherein the DPR module is locally tested on the computing device operated by the software developer who has knowledge of the domain information in the web page, before the DPR module is received by the processor, to verify that the DPR module satisfies the specification of the DPR module request.
 14. The apparatus of claim 1, wherein the processor is further configured to communicate with a resource and the received DPR module is configured to use the resource to provide a context-aware functionality.
 15. The apparatus of claim 14, wherein the processor is further configured to provide an application programming interface (API) to enable an external system to use the received DPR module.
 16. The apparatus of claim 15, wherein the received DPR module is configured to use the resource to determine a physical location of the external system.
 17. The apparatus of claim 1, wherein the processor is configured to generate multiple DPR module requests requesting multiple DPR modules, wherein each of the requests contains a specification for the corresponding DPR module, and the specifications are the same and are tailored to process the same domain information in the web page.
 18. The apparatus of claim 17, wherein the processor is further configured to transmit the generated multiple DPR module requests to multiple computing devices in the plurality of computing devices operated by software developers who have knowledge of the domain information in the web page, receive multiple DPR modules from the multiple computing devices, and select a DPR module from the received multiple DPR modules that has the lowest computational time.
 19. A method of crowdsourcing information for processing domain information received from a data source, the method comprising: communicating, by one or more interfaces and a processor in communication with the one or more interfaces, with a plurality of computing devices, wherein the plurality of computing devices are operated by a plurality of software developers; receiving, by the processor via the one or more interfaces, information from a data source including a web page and the web page containing domain information; generating, by the processor, a data processing rule (DPR) module request requesting a DPR module, wherein the request contains a specification of the DPR module and the specification is tailored to process the domain information in the web page; transmitting, by the processor via the one or more interfaces, the generated DPR module request to a computing device in the plurality of computing devices operated by a software developer who has knowledge of the domain information in the web page; receiving, by the processor via the one or more interfaces, a DPR module from the computing device, wherein the received DPR module is implemented by the software developer using a programming language according to the specification in the DPR module request, the received DPR module including rules representing the software developer's knowledge of the domain information in the web page and the rules being programmed into the DPR module using the programming language; and processing, by the processor, the received DPR module including the rules representing the software developer's knowledge of the domain information in the web page, wherein processing the received DPR module allows the processor to determine that the domain information in the web page has a particular meaning.
 20. A non-transitory computer-readable medium having instructions executable by a data processing apparatus including one or more interfaces and a processor in communication with the one or more interfaces to: communicate, by the processor via the one or more interfaces, with a plurality of computing devices, wherein the plurality of computing devices are operated by a plurality of software developers; receive, by the processor via the one or more interfaces, information from a data source including a web page and the web page containing domain information; generate, by the processor, a data processing rule (DPR) module request requesting a DPR module, wherein the request contains a specification of the DPR module and the specification is tailored to process the domain information in the web page; transmit, by the processor via the one or more interfaces, the generated DPR module request to a computing device in the plurality of computing devices operated by a software developer who has knowledge of the domain information in the web page; receive, by the processor via the one or more interfaces, a DPR module from the computing device, wherein the received DPR module is implemented by the software developer using a programming language according to the specification in the DPR module request, the received DPR module including rules representing the software developer's knowledge of the domain information in the web page and the rules being programmed into the DPR module using the programming language; and process, by the processor, the received DPR module including the rules representing the software developer's knowledge of the domain information in the web page, wherein processing the received DPR module allows the processor to determine that the domain information in the web page has a particular meaning.
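
By way of non-limiting illustration only, the telephone-number DPR module of claims 2 to 4, the specification check of claim 13, and the selection by lowest computational time of claim 18 might be sketched as follows. Everything named in the sketch is an assumption made for illustration and does not appear in the disclosure: the country code +99 and its numbering plan (nine national digits, a leading 7 for mobile, a leading 2 for landline) are invented, the functions `classify_regex`, `classify_manual`, `satisfies_spec`, and `select_fastest` are hypothetical, and Python is used for brevity although the claims recite other programming languages.

```python
import re
import time
from typing import Callable, List, Optional, Tuple

# A DPR module is modeled as a plain function: raw text in, label out.
DprModule = Callable[[str], Optional[str]]
SpecCase = Tuple[str, Optional[str]]

# Hypothetical numbering plan for an imaginary country code +99:
# nine national digits; leading 7 = mobile, leading 2 = landline.
_PATTERN = re.compile(r"^\+99(\d{9})$")
_KINDS = {"7": "mobile", "2": "landline"}

def classify_regex(raw: str) -> Optional[str]:
    """One submitted DPR module: strip separators, then match a regex."""
    compact = re.sub(r"[\s().-]", "", raw)
    match = _PATTERN.match(compact)
    if match is None:
        return None
    return _KINDS.get(match.group(1)[0])

def classify_manual(raw: str) -> Optional[str]:
    """A second submission written against the same specification."""
    compact = "".join(ch for ch in raw if ch.isdigit() or ch == "+")
    if not compact.startswith("+99"):
        return None
    national = compact[3:]
    if len(national) != 9 or not national.isdigit():
        return None
    return _KINDS.get(national[0])

# The specification is modeled as input/expected-output pairs.
SPEC_CASES: List[SpecCase] = [
    ("+99 712 345 678", "mobile"),
    ("+99 (212) 345-678", "landline"),
    ("+99 512 345 678", None),   # unassigned prefix in the invented plan
    ("+98 712 345 678", None),   # different country code
]

def satisfies_spec(module: DprModule, cases: List[SpecCase]) -> bool:
    """Check that a module returns the expected label for every case."""
    return all(module(raw) == expected for raw, expected in cases)

def select_fastest(modules: List[DprModule], cases: List[SpecCase],
                   repeats: int = 200) -> Optional[DprModule]:
    """Among modules that satisfy the spec, return the one with the
    lowest measured running time over the cases (None if none qualify)."""
    best, best_time = None, float("inf")
    for module in modules:
        if not satisfies_spec(module, cases):
            continue
        start = time.perf_counter()
        for _ in range(repeats):
            for raw, _expected in cases:
                module(raw)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best, best_time = module, elapsed
    return best
```

In this sketch a submission that fails any specification case is excluded before timing, so the selection is made only among modules that already satisfy the request. Which of the qualifying modules is selected depends on the measured running time on the machine at hand.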